Ensemble Estimation of Distributional Functionals via $k$-Nearest Neighbors
Abstract
The problem of accurately estimating distributional functionals (integral functionals of one or more probability distributions) nonparametrically has received recent interest because such functionals are widely applicable in signal processing, information theory, machine learning, and statistics. In particular, $k$-nearest neighbor ($k$-NN) based methods have received considerable attention due to their adaptive nature and relatively low computational complexity. We derive the mean squared error (MSE) convergence rates of plug-in estimators, built from leave-one-out $k$-NN density estimates, for a large class of distributional functionals without boundary correction. We then apply the theory of optimally weighted ensemble estimation to obtain weighted ensemble estimators that achieve the parametric MSE rate $O(1/N)$ under assumptions that are competitive with the state of the art. We also present the asymptotic distributions of these estimators, which are unknown for all other $k$-NN based distributional functional estimators; these distributions enable hypothesis testing.
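The sketch below illustrates the two ingredients the abstract names, under several stated assumptions; it is not the authors' implementation. Shannon entropy, $H(f) = -\mathbb{E}[\log f(X)]$, is used as one example functional. The leave-one-out $k$-NN density estimate is $\hat f_k(X_i) = k / \big((N-1)\, c_d\, R_k(X_i)^d\big)$, where $R_k(X_i)$ is the distance from $X_i$ to its $k$-th nearest other sample and $c_d$ is the unit-ball volume. The ensemble uses bandwidths $k_l = \mathrm{round}(l\sqrt{N})$ and minimum-norm weights satisfying $\sum_l w_l = 1$ and $\sum_l w_l\, l^{i/d} = 0$ for $i = 1, \dots, d-1$. This constraint set is an assumption about which bias terms to cancel and may differ from the paper's exact bias expansion. All function names here are hypothetical.

```python
import math
import random


def unit_ball_volume(d: int) -> float:
    """Volume c_d of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def sorted_neighbor_distances(X):
    """Leave-one-out: for each point, sorted distances to every *other* point (O(N^2))."""
    out = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        out.append(dists)
    return out


def knn_entropy_plugin(sorted_dists, k: int, d: int) -> float:
    """Plug-in estimate of Shannon entropy -E[log f(X)] using the leave-one-out
    k-NN density estimate f_k(X_i) = k / ((N - 1) * c_d * R_k(X_i)^d).
    No bias or boundary correction is applied."""
    n = len(sorted_dists)
    c_d = unit_ball_volume(d)
    total = 0.0
    for dists in sorted_dists:
        r_k = dists[k - 1]  # distance from X_i to its k-th nearest other sample
        total += math.log(k / ((n - 1) * c_d * r_k ** d))
    return -total / n


def solve(M, b):
    """Solve the small linear system M y = b (Gaussian elimination, partial pivoting)."""
    m = len(M)
    A = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    y = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * y[c] for c in range(r + 1, m))
        y[r] = (A[r][m] - s) / A[r][r]
    return y


def ensemble_weights(ls, d: int):
    """Minimum-norm weights: minimize ||w||_2 subject to sum(w) = 1 and
    sum_l w_l * l^(i/d) = 0 for i = 1, ..., d-1 (ASSUMED bias terms to cancel).
    Closed form: w = A^T (A A^T)^{-1} e_1."""
    rows = [[1.0] * len(ls)] + [[l ** (i / d) for l in ls] for i in range(1, d)]
    b = [1.0] + [0.0] * (len(rows) - 1)
    gram = [[sum(p * q for p, q in zip(ri, rj)) for rj in rows] for ri in rows]
    y = solve(gram, b)
    return [sum(rows[r][j] * y[r] for r in range(len(rows))) for j in range(len(ls))]


def ensemble_entropy(X, ls):
    """Weighted ensemble of k-NN plug-in estimates with k_l = round(l * sqrt(N)) (assumed)."""
    n, d = len(X), len(X[0])
    sd = sorted_neighbor_distances(X)
    w = ensemble_weights(ls, d)
    ks = [max(1, round(l * math.sqrt(n))) for l in ls]
    estimates = [knn_entropy_plugin(sd, k, d) for k in ks]
    return sum(wi * e for wi, e in zip(w, estimates)), w


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
    est, w = ensemble_entropy(X, ls=[0.5, 1.0, 1.5, 2.0, 2.5])
    print("weights:", [round(wi, 3) for wi in w])
    print("ensemble estimate:", est)
    print("true entropy of N(0, I_2):", math.log(2 * math.pi * math.e))
```

With $d = 2$, the only assumed bias constraint is $\sum_l w_l \sqrt{l} = 0$. In that case the minimum-norm weights reproduce a least-squares line fit of the individual estimates against $\sqrt{l}$, extrapolated to $\sqrt{l} = 0$, so some weights are expected to be negative. The $O(N^2)$ pairwise-distance loop keeps the example self-contained. A k-d tree would be the usual choice for larger samples.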
Similar resources
Direct Ensemble Estimation of Density Functionals
Estimating density functionals of analog sources is an important problem in statistical signal processing and information theory. Traditionally, estimating these quantities requires either making parametric assumptions about the underlying distributions or using non-parametric density estimation followed by integration. In this paper we introduce a direct nonparametric approach which bypasses t...
Optimal rates for k-NN density and mode estimation
We present two related contributions of independent interest: (1) high-probability finite sample rates for k-NN density estimation, and (2) practical mode estimators – based on k-NN – which attain minimax-optimal rates under surprisingly general distributional conditions.
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
Exploring the neighbor graph to improve distributional thesauri (Explorer le graphe de voisinage pour améliorer les thésaurus distributionnels) [in French]
In this paper, we address the issue of building and improving a distributional thesaurus. We first show that existing tools from the information retrieval domain can be directly used in order to build a thesaurus with state-of-the-art performance. Secondly, we focus more specifically on improving the obtained thesaurus, seen as a graph of k-nearest neighbors. By exploiting information about the...
Improving distributional thesauri by exploring the graph of neighbors
In this paper, we address the issue of building and improving a distributional thesaurus. We first show that existing tools from the information retrieval domain can be directly used in order to build a thesaurus with state-of-the-art performance. Secondly, we focus more specifically on improving the obtained thesaurus, seen as a graph of k-nearest neighbors. By exploiting information about the...
Journal: CoRR
Volume: abs/1707.03083
Publication date: 2017